PGCOPYDB COPY(1) | pgcopydb | PGCOPYDB COPY(1) |
pgcopydb copy - pgcopydb copy
pgcopydb copy - Implement the data section of the database copy
This command prefixes the following sub-commands:
pgcopydb copy: Implement the data section of the database copy Available commands: pgcopydb copy db Copy an entire database from source to target roles Copy the roles from the source instance to the target instance extensions Copy the extensions from the source instance to the target instance schema Copy the database schema from source to target data Copy the data section from source to target table-data Copy the data from all tables in database from source to target blobs Copy the blob data from the source database to the target sequences Copy the current value from all sequences in database from source to target indexes Create all the indexes found in the source database in the target constraints Create all the constraints found in the source database in the target
Those commands implement a part of the whole database copy operation as detailed in section pgcopydb clone. Only use those commands to debug a specific part, or because you know that you just want to implement that step.
WARNING:
This mode of operations is useful for debugging and advanced use cases only.
pgcopydb copy db - Copy an entire database from source to target
The command pgcopydb copy db is an alias for pgcopydb clone. See also pgcopydb clone.
pgcopydb copy db: Copy an entire database from source to target usage: pgcopydb copy db --source ... --target ... [ --table-jobs ... --index-jobs ... ] --source Postgres URI to the source database --target Postgres URI to the target database --dir Work directory to use --table-jobs Number of concurrent COPY jobs to run --index-jobs Number of concurrent CREATE INDEX jobs to run --restore-jobs Number of concurrent jobs for pg_restore --drop-if-exists On the target database, clean-up from a previous run first --roles Also copy roles found on source to target --no-owner Do not set ownership of objects to match the original database --no-acl Prevent restoration of access privileges (grant/revoke commands). --no-comments Do not output commands to restore comments --skip-large-objects Skip copying large objects (blobs) --filters <filename> Use the filters defined in <filename> --fail-fast Abort early in case of error --restart Allow restarting when temp files exist already --resume Allow resuming operations after a failure --not-consistent Allow taking a new snapshot on the source database --snapshot Use snapshot obtained with pg_export_snapshot
pgcopydb copy roles - Copy the roles from the source instance to the target instance
The command pgcopydb copy roles implements both pgcopydb dump roles and then pgcopydb restore roles.
pgcopydb copy roles: Copy the roles from the source instance to the target instance usage: pgcopydb copy roles --source ... --target ... --source Postgres URI to the source database --target Postgres URI to the target database --dir Work directory to use --no-role-passwords Do not dump passwords for roles
NOTE:
When a role already exists on the target database, its restoring is entirely skipped, which includes skipping both the CREATE ROLE and the ALTER ROLE commands produced by pg_dumpall --roles-only.
The pg_dumpall --roles-only is used to fetch the list of roles from the source database, and this command includes support for passwords. As a result, this operation requires the superuser privileges.
pgcopydb copy extensions - Copy the extensions from the source instance to the target instance
The command pgcopydb copy extensions gets a list of the extensions installed on the source database, and for each of them run the SQL command CREATE EXTENSION IF NOT EXISTS.
pgcopydb copy extensions: Copy the extensions from the source instance to the target instance usage: pgcopydb copy extensions --source ... --target ... --source Postgres URI to the source database --target Postgres URI to the target database --dir Work directory to use --requirements List extensions requirements
When copying extensions, this command also takes care of copying any Extension Configuration Tables user-data to the target database.
pgcopydb copy schema - Copy the database schema from source to target
The command pgcopydb copy schema implements the schema only section of the clone steps.
pgcopydb copy schema: Copy the database schema from source to target usage: pgcopydb copy schema --source ... --target ... [ --table-jobs ... --index-jobs ... ] --source Postgres URI to the source database --target Postgres URI to the target database --dir Work directory to use --filters <filename> Use the filters defined in <filename> --restart Allow restarting when temp files exist already --resume Allow resuming operations after a failure --not-consistent Allow taking a new snapshot on the source database --snapshot Use snapshot obtained with pg_export_snapshot
pgcopydb copy data - Copy the data section from source to target
The command pgcopydb copy data implements the data section of the clone steps.
pgcopydb copy data: Copy the data section from source to target usage: pgcopydb copy data --source ... --target ... [ --table-jobs ... --index-jobs ... ] --source Postgres URI to the source database --target Postgres URI to the target database --dir Work directory to use --table-jobs Number of concurrent COPY jobs to run --index-jobs Number of concurrent CREATE INDEX jobs to run --restore-jobs Number of concurrent jobs for pg_restore --skip-large-objects Skip copying large objects (blobs) --filters <filename> Use the filters defined in <filename> --restart Allow restarting when temp files exist already --resume Allow resuming operations after a failure --not-consistent Allow taking a new snapshot on the source database --snapshot Use snapshot obtained with pg_export_snapshot
NOTE:
The pgcopydb copy data command implements the following steps:
$ pgcopydb copy table-data $ pgcopydb copy blobs $ pgcopydb copy indexes $ pgcopydb copy constraints $ pgcopydb copy sequences $ vacuumdb -z
Those steps are actually done concurrently to one another when that's possible, in the same way as the main command pgcopydb clone would. The only difference is that the pgcopydb clone command also prepares and finishes the schema parts of the operations (pre-data, then post-data), which the pgcopydb copy data command ignores.
pgcopydb copy table-data - Copy the data from all tables in database from source to target
The command pgcopydb copy table-data fetches the list of tables from the source database and runs a COPY TO command on the source database and sends the result to the target database using a COPY FROM command directly, avoiding disks entirely.
pgcopydb copy table-data: Copy the data from all tables in database from source to target usage: pgcopydb copy table-data --source ... --target ... [ --table-jobs ... --index-jobs ... ] --source Postgres URI to the source database --target Postgres URI to the target database --dir Work directory to use --table-jobs Number of concurrent COPY jobs to run --filters <filename> Use the filters defined in <filename> --restart Allow restarting when temp files exist already --resume Allow resuming operations after a failure --not-consistent Allow taking a new snapshot on the source database --snapshot Use snapshot obtained with pg_export_snapshot
pgcopydb copy blobs - Copy the blob data from the source database to the target
The command pgcopydb copy blobs fetches list of large objects (aka blobs) from the source database and copies their data parts to the target database. By default the command assumes that the large objects metadata have already been taken care of, because of the behaviour of pg_dump --section=pre-data.
pgcopydb copy blobs: Copy the blob data from the source database to the target usage: pgcopydb copy blobs --source ... --target ... [ --table-jobs ... --index-jobs ... ] --source Postgres URI to the source database --target Postgres URI to the target database --dir Work directory to use --large-objects-jobs Number of concurrent Large Objects jobs to run --drop-if-exists On the target database, drop and create large objects --restart Allow restarting when temp files exist already --resume Allow resuming operations after a failure --not-consistent Allow taking a new snapshot on the source database --snapshot Use snapshot obtained with pg_export_snapshot
pgcopydb copy sequences - Copy the current value from all sequences in database from source to target
The command pgcopydb copy sequences fetches the list of sequences from the source database, then for each sequence fetches the last_value and is_called properties the same way pg_dump would on the source database, and then for each sequence call pg_catalog.setval() on the target database.
pgcopydb copy sequences: Copy the current value from all sequences in database from source to target usage: pgcopydb copy sequences --source ... --target ... [ --table-jobs ... --index-jobs ... ] --source Postgres URI to the source database --target Postgres URI to the target database --dir Work directory to use --filters <filename> Use the filters defined in <filename> --restart Allow restarting when temp files exist already --resume Allow resuming operations after a failure --not-consistent Allow taking a new snapshot on the source database --snapshot Use snapshot obtained with pg_export_snapshot
pgcopydb copy indexes - Create all the indexes found in the source database in the target
The command pgcopydb copy indexes fetches the list of indexes from the source database and runs each index CREATE INDEX statement on the target database. The statements for the index definitions are modified to include IF NOT EXISTS and allow for skipping indexes that already exist on the target database.
pgcopydb copy indexes: Create all the indexes found in the source database in the target usage: pgcopydb copy indexes --source ... --target ... [ --table-jobs ... --index-jobs ... ] --source Postgres URI to the source database --target Postgres URI to the target database --dir Work directory to use --index-jobs Number of concurrent CREATE INDEX jobs to run --restore-jobs Number of concurrent jobs for pg_restore --filters <filename> Use the filters defined in <filename> --restart Allow restarting when temp files exist already --resume Allow resuming operations after a failure --not-consistent Allow taking a new snapshot on the source database
pgcopydb copy constraints - Create all the constraints found in the source database in the target
The command pgcopydb copy constraints fetches the list of indexes from the source database and runs each index ALTER TABLE ... ADD CONSTRAINT ... USING INDEX statement on the target database.
The indexes must already exist, and the command will fail if any constraint is found existing already on the target database.
pgcopydb copy constraints: Create all the constraints found in the source database in the target usage: pgcopydb copy constraints --source ... --target ... [ --table-jobs ... --index-jobs ... ] --source Postgres URI to the source database --target Postgres URI to the target database --dir Work directory to use --filters <filename> Use the filters defined in <filename> --restart Allow restarting when temp files exist already --resume Allow resuming operations after a failure --not-consistent Allow taking a new snapshot on the source database
These commands allow implementing a specific step of the pgcopydb operations at a time. It's useful mainly for debugging purposes, though some advanced and creative usage can be made from the commands.
The target schema is not created, so it needs to have been taken care of first. It is possible to use the commands pgcopydb dump schema and then pgcopydb restore pre-data to prepare your target database.
To implement the same operations as a pgcopydb clone command would, use the following recipe:
$ export PGCOPYDB_SOURCE_PGURI="postgres://user@source/dbname" $ export PGCOPYDB_TARGET_PGURI="postgres://user@target/dbname" $ pgcopydb dump schema $ pgcopydb restore pre-data --resume --not-consistent $ pgcopydb copy table-data --resume --not-consistent $ pgcopydb copy sequences --resume --not-consistent $ pgcopydb copy indexes --resume --not-consistent $ pgcopydb copy constraints --resume --not-consistent $ vacuumdb -z $ pgcopydb restore post-data --resume --not-consistent
The main pgcopydb clone is still better at concurrency than doing those steps manually, as it will create the indexes for any given table as soon as the table-data section is finished, without having to wait until the last table-data has been copied over. Same applies to constraints, and then vacuum analyze.
The following options are available to pgcopydb copy sub-commands:
This limit only applies to the COPY operations, more sub-processes will be running at the same time that this limit while the CREATE INDEX operations are in progress, though then the processes are only waiting for the target Postgres instance to do all the work.
In that case, the --restart option can be used to allow pgcopydb to delete traces from a previous run.
When resuming activity from a previous run, table data that was fully copied over to the target server is not sent again. Table data that was interrupted during the COPY has to be started from scratch even when using --resume: the COPY command in Postgres is transactional and was rolled back.
Same reasonning applies to the CREATE INDEX commands and ALTER TABLE commands that pgcopydb issues, those commands are skipped on a --resume run only if known to have run through to completion on the previous one.
Finally, using --resume requires the use of --not-consistent.
Per the Postgres documentation about pg_export_snapshot:
Now, when the pgcopydb process was interrupted (or crashed) on a previous run, it is possible to resume operations, but the snapshot that was exported does not exists anymore. The pgcopydb command can only resume operations with a new snapshot, and thus can not ensure consistency of the whole data set, because each run is now using their own snapshot.
PGCOPYDB_SOURCE_PGURI
PGCOPYDB_TARGET_PGURI
PGCOPYDB_TABLE_JOBS
PGCOPYDB_INDEX_JOBS
PGCOPYDB_RESTORE_JOBS
PGCOPYDB_LARGE_OBJECTS_JOBS
PGCOPYDB_SPLIT_TABLES_LARGER_THAN
When --split-tables-larger-than is ommitted from the command line, then this environment variable is used.
PGCOPYDB_DROP_IF_EXISTS
PGCOPYDB_SNAPSHOT
TMPDIR
Let's export the Postgres databases connection strings to make it easy to re-use them all along:
$ export PGCOPYDB_SOURCE_PGURI=postgres://pagila:0wn3d@source/pagila $ export PGCOPYDB_TARGET_PGURI=postgres://pagila:0wn3d@target/pagila
Now, first dump the schema:
$ pgcopydb dump schema 14:28:50 22 INFO Running pgcopydb version 0.13.38.g22e6544.dirty from "/usr/local/bin/pgcopydb" 14:28:50 22 INFO Dumping database from "postgres://pagila@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60" 14:28:50 22 INFO Dumping database into directory "/tmp/pgcopydb" 14:28:50 22 INFO Using pg_dump for Postgres "16.1" at "/usr/bin/pg_dumpall" 14:28:50 22 INFO Exported snapshot "00000003-00000022-1" from the source database 14:28:50 22 INFO /usr/bin/pg_dump -Fc --snapshot 00000003-00000022-1 --section pre-data --file /tmp/pgcopydb/schema/pre.dump 'postgres://pagila@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' 14:28:51 22 INFO /usr/bin/pg_dump -Fc --snapshot 00000003-00000022-1 --section post-data --file /tmp/pgcopydb/schema/post.dump 'postgres://pagila@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'
Now restore the pre-data schema on the target database, cleaning up the already existing objects if any, which allows running this test scenario again and again. It might not be what you want to do in your production target instance though!
$ PGCOPYDB_DROP_IF_EXISTS=on pgcopydb restore pre-data --no-owner --resume --not-consistent 14:28:51 26 INFO Running pgcopydb version 0.13.38.g22e6544.dirty from "/usr/local/bin/pgcopydb" 14:28:51 26 INFO Schema dump for pre-data and post-data section have been done 14:28:51 26 INFO Restoring database from existing files at "/tmp/pgcopydb" 14:28:51 26 INFO Using pg_restore for Postgres "16.1" at "/usr/bin/pg_restore" 14:28:51 26 INFO [TARGET] Restoring database into "postgres://pagila@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60" 14:28:51 26 INFO Drop tables on the target database, per --drop-if-exists 14:28:51 26 INFO No tables to migrate, skipping drop tables on the target database 14:28:51 26 INFO /usr/bin/pg_restore --dbname 'postgres://pagila@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --single-transaction --clean --
Then copy the data over:
$ pgcopydb copy table-data --resume --not-consistent 14:28:52 30 INFO Running pgcopydb version 0.13.38.g22e6544.dirty from "/usr/local/bin/pgcopydb" 14:28:52 30 INFO [SOURCE] Copying database from "postgres://pagila@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60" 14:28:52 30 INFO [TARGET] Copying database into "postgres://pagila@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60" 14:28:52 30 INFO Schema dump for pre-data and post-data section have been done 14:28:52 30 INFO Pre-data schema has been restored on the target instance 14:28:52 30 INFO Copy data from source to target in sub-processes ... Step Connection Duration Transfer Concurrency -------------------------------------------------- ---------- ---------- ---------- ------------ Dump Schema source 0ms 1 Catalog Queries (table ordering, filtering, etc) source 0ms 1 Prepare Schema target 0ms 1 COPY, INDEX, CONSTRAINTS, VACUUM (wall clock) both 0ms 4 + 8 COPY (cumulative) both 1s671 2955 kB 4 Large Objects (cumulative) both 4 CREATE INDEX, CONSTRAINTS (cumulative) target 0ms 4 Finalize Schema target 0ms 1 -------------------------------------------------- ---------- ---------- ---------- ------------ Total Wall Clock Duration both 753ms 4 + 8 -------------------------------------------------- ---------- ---------- ---------- ------------
And now create the indexes on the target database, using the index definitions from the source database:
$ pgcopydb copy indexes --resume --not-consistent 14:28:53 47 INFO Running pgcopydb version 0.13.38.g22e6544.dirty from "/usr/local/bin/pgcopydb" 14:28:53 47 INFO [SOURCE] Copying database from "postgres://pagila@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60" 14:28:53 47 INFO [TARGET] Copying database into "postgres://pagila@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60" 14:28:53 47 INFO Schema dump for pre-data and post-data section have been done 14:28:53 47 INFO Pre-data schema has been restored on the target instance 14:28:53 47 INFO All the table data has been copied to the target instance 14:28:53 47 INFO All the indexes have been copied to the target instance 14:28:53 47 INFO Fetched information for 54 indexes 14:28:53 47 INFO Creating 54 indexes in the target database using 4 processes Step Connection Duration Transfer Concurrency -------------------------------------------------- ---------- ---------- ---------- ------------ Dump Schema source 0ms 1 Catalog Queries (table ordering, filtering, etc) source 0ms 1 Prepare Schema target 0ms 1 COPY, INDEX, CONSTRAINTS, VACUUM (wall clock) both 0ms 4 + 8 COPY (cumulative) both 0ms 0 B 4 Large Objects (cumulative) both 4 CREATE INDEX, CONSTRAINTS (cumulative) target 0ms 4 Finalize Schema target 0ms 1 -------------------------------------------------- ---------- ---------- ---------- ------------ Total Wall Clock Duration both 696ms 4 + 8 -------------------------------------------------- ---------- ---------- ---------- ------------
Now re-create the constraints (primary key, unique constraints) from the source database schema into the target database:
$ pgcopydb copy constraints --resume --not-consistent 14:28:54 53 INFO Running pgcopydb version 0.13.38.g22e6544.dirty from "/usr/local/bin/pgcopydb" 14:28:54 53 INFO [SOURCE] Copying database from "postgres://pagila@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60" 14:28:54 53 INFO [TARGET] Copying database into "postgres://pagila@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60" 14:28:54 53 INFO Schema dump for pre-data and post-data section have been done 14:28:54 53 INFO Pre-data schema has been restored on the target instance 14:28:54 53 INFO All the table data has been copied to the target instance 14:28:54 53 INFO All the indexes have been copied to the target instance 14:28:54 53 INFO Create constraints 14:28:54 53 INFO Fetched information for 54 indexes 14:28:54 53 INFO Creating 54 indexes in the target database using 4 processes Step Connection Duration Transfer Concurrency -------------------------------------------------- ---------- ---------- ---------- ------------ Dump Schema source 0ms 1 Catalog Queries (table ordering, filtering, etc) source 0ms 1 Prepare Schema target 0ms 1 COPY, INDEX, CONSTRAINTS, VACUUM (wall clock) both 0ms 4 + 8 COPY (cumulative) both 0ms 0 B 4 Large Objects (cumulative) both 4 CREATE INDEX, CONSTRAINTS (cumulative) target 0ms 4 Finalize Schema target 0ms 1 -------------------------------------------------- ---------- ---------- ---------- ------------ Total Wall Clock Duration both 283ms 4 + 8 -------------------------------------------------- ---------- ---------- ---------- ------------
The next step is a VACUUM ANALYZE on each table that's been just filled-in with the data, and for that we can just use the vacuumdb command from Postgres:
$ vacuumdb --analyze --dbname "$PGCOPYDB_TARGET_PGURI" --jobs 4 vacuumdb: vacuuming database "pagila"
Finally we can restore the post-data section of the schema:
$ pgcopydb restore post-data --resume --not-consistent 14:28:54 60 INFO Running pgcopydb version 0.13.38.g22e6544.dirty from "/usr/local/bin/pgcopydb" 14:28:54 60 INFO Schema dump for pre-data and post-data section have been done 14:28:54 60 INFO Pre-data schema has been restored on the target instance 14:28:54 60 INFO All the table data has been copied to the target instance 14:28:54 60 INFO All the indexes have been copied to the target instance 14:28:54 60 INFO Restoring database from existing files at "/tmp/pgcopydb" 14:28:54 60 INFO Using pg_restore for Postgres "16.1" at "/usr/bin/pg_restore" 14:28:54 60 INFO [TARGET] Restoring database into "postgres://pagila@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60" 14:28:55 60 INFO /usr/bin/pg_restore --dbname 'postgres://pagila@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --single-transaction --use-list /tmp/pgcopydb/schema/post-filtered.list /tmp/pgcopydb/schema/post.dump
Dimitri Fontaine
2022-2024, Dimitri Fontaine
January 10, 2024 | 0.15 |